
    Ethical challenges in the development of virtual assistants powered by large language models

    Virtual assistants (VAs) have gained widespread popularity across a wide range of applications, and the integration of Large Language Models (LLMs) such as ChatGPT has opened up new possibilities for developing even more sophisticated VAs. However, this integration raises new ethical issues that must be carefully considered, such as the transfer of personal data, the transparency of decision-making, potential biases, and privacy risks, particularly as these systems are increasingly used in public services. This paper, an extension of the work presented at IberSPEECH 2022, analyzes the current European regulatory framework for AI-based VAs and examines the ethical issues in depth, weighing the potential benefits and drawbacks of integrating LLMs with VAs. Based on this analysis, the paper argues that the development and use of LLM-powered VAs should be guided by a set of ethical principles that prioritize transparency, fairness, and harm prevention. The paper presents specific guidelines for the ethical development and use of this technology, including recommendations on data privacy, bias mitigation, and user control. By implementing these guidelines, the potential benefits of LLM-powered VAs can be fully realized while minimizing the risk of harm and keeping ethical considerations at the forefront of the development process.
    Funding: Agencia Gallega de Innovación (GAIN), Xunta de Galicia | Ref. ED431B 2021/2

    SEA_AP: a segmentation and labelling tool for prosodic analysis

    This paper introduces a tool that segments and labels speech into phone, syllable and/or word units, starting from a speech signal and its corresponding orthographic transcription. It also integrates acoustic analysis scripts that run in Praat, with the aim of reducing the time spent on analysing, correcting and smoothing the melodic curve and on generating plots of it. The tool is implemented for Galician, Spanish and Brazilian Portuguese. With this application, we aim to help automate some segmentation, labelling and prosodic analysis tasks, since these require a large investment of time and human resources.
    This work would not have been possible without the help of the Spanish Government (Project ‘SpeechTech4All’ TEC2012-38939-C03-01), the European Regional Development Fund (ERDF), the Government of the Autonomous Community of Galicia (GRC2014/024, “Consolidación de Unidades de Investigación: Proyecto AtlantTIC” CN2012/160) and the “Red de Investigación TecAnDAli” from the Council of Culture, Education and University Planning, Xunta de Galicia.
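The abstract mentions Praat scripts for smoothing the melodic curve but does not describe the smoothing method. As an illustration only, the Python sketch below applies a simple median filter to an F0 contour, assuming the contour is a list of values in Hz with `None` marking unvoiced frames; `smooth_f0` and its `window` parameter are hypothetical and are not part of SEA_AP or Praat.

```python
from statistics import median
from typing import List, Optional


def smooth_f0(f0: List[Optional[float]], window: int = 5) -> List[Optional[float]]:
    """Median-smooth an F0 contour; unvoiced frames (None) stay unvoiced.

    Each voiced frame is replaced by the median of the voiced frames within
    +/- window // 2 positions. Unvoiced frames are never used as neighbours,
    so the filter does not mix silence into the melodic curve.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[Optional[float]] = []
    for i, value in enumerate(f0):
        if value is None:
            smoothed.append(None)  # keep unvoiced gaps as gaps
            continue
        lo, hi = max(0, i - half), min(len(f0), i + half + 1)
        neighbours = [v for v in f0[lo:hi] if v is not None]
        smoothed.append(median(neighbours))
    return smoothed


# Hypothetical contour in Hz with an isolated octave jump (250.0) and a pause
contour = [120.0, 122.0, 250.0, 124.0, 126.0, None, None, 130.0, 131.0]
print(smooth_f0(contour, window=3))
# [121.0, 122.0, 124.0, 126.0, 125.0, None, None, 130.5, 130.5]
```

A median filter is used here because it removes isolated jumps such as pitch-halving or pitch-doubling errors without blurring them into neighbouring frames, which a moving average would do.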

    Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

    Query-by-example spoken term detection (QbE STD) aims to retrieve data from a speech repository given, as input, an acoustic query containing the term of interest. It is currently attracting considerable interest because of the large volume of multimedia information available. This paper presents the systems submitted to the ALBAYZIN QbE STD 2014 evaluation, held as part of the ALBAYZIN 2014 Evaluation campaign within the context of the IberSPEECH 2014 conference. This is the second QbE STD evaluation in Spanish, which allows the progress of this technology for this language to be assessed. The evaluation consists of retrieving the speech files that contain the input queries, indicating the start and end times at which each query was found, along with a score that reflects the confidence in the detection. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from workshops, amounting to about 7 h of speech. We present the database, the evaluation metric, the systems submitted to the evaluation and the results, and compare this second evaluation with the first ALBAYZIN QbE STD evaluation, held in 2012. Four different research groups took part in the evaluations held in 2012 and 2014. In 2014, multi-word and foreign queries were added to the single-word and in-language queries used in 2012. The systems submitted to the second evaluation are hybrid systems that integrate letter-transcription-based and template-matching-based systems. Despite the significant improvement of these systems over those of the first evaluation, the results still show the difficulty of this task and indicate that there is room for improvement.
    This research was funded by the Spanish Government ('SpeechTech4All Project' TEC2012-38939-C03-01 and 'CMC-V2 Project' TEC2012-37585-C02-01), the Galician Government through the research contract GRC2014/024 (Modalidade: Grupos de Referencia Competitiva 2014) and 'AtlantTIC Project' CN2012/160, and also by the Spanish Government and the European Regional Development Fund (ERDF) under project TACTICA.
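Template matching in QbE STD is commonly done with some form of dynamic time warping (DTW) between the spoken query and each utterance. The following is a minimal subsequence-DTW sketch in Python, not the method of any submitted system; it assumes both signals are already converted to frame-level feature vectors, uses Euclidean frame distance, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector, e.g. MFCCs or posteriorgrams


def euclidean(a: Frame, b: Frame) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[int, int, float]:
    """Locate the utterance span that best matches the whole spoken query.

    Returns (start_frame, end_frame, cost); cost is the accumulated frame
    distance divided by the query length (lower means a better match).
    The match may start and end at any utterance frame.
    """
    n, m = len(query), len(utterance)
    acc = [[math.inf] * m for _ in range(n)]  # accumulated cost
    origin = [[0] * m for _ in range(n)]      # utterance frame where the path began
    for j in range(m):
        acc[0][j] = euclidean(query[0], utterance[j])
        origin[0][j] = j  # free start on the utterance axis
    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1), horizontal (i, j-1)
            best, start = acc[i - 1][j], origin[i - 1][j]
            if j > 0 and acc[i - 1][j - 1] < best:
                best, start = acc[i - 1][j - 1], origin[i - 1][j - 1]
            if j > 0 and acc[i][j - 1] < best:
                best, start = acc[i][j - 1], origin[i][j - 1]
            acc[i][j] = euclidean(query[i], utterance[j]) + best
            origin[i][j] = start
    end = min(range(m), key=lambda j: acc[n - 1][j])  # free end on the utterance axis
    return origin[n - 1][end], end, acc[n - 1][end] / n


query = [[1.0], [2.0], [3.0]]
utterance = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]]
start, end, cost = subsequence_dtw(query, utterance)
print(start, end, cost)  # 2 4 0.0
```

Multiplying the returned frame indices by the frame shift (10 ms is a common choice, assumed here) gives the start and end times the evaluation asks for, and a value such as `-cost` can serve as the detection score. The full cost table keeps the sketch short; for large repositories, practical systems add pruning or other speed-ups.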

    Study on the impact of the training corpus of the language model on the performance of a speech recognizer

    In automatic speech recognition, statistical language models based on the probability of word sequences (n-grams) are one of the two pillars on which correct recognition rests, the other being the acoustic model. This paper shows how recognition performance is affected as these models are improved with more, higher-quality text and adapted to the final application of the system, thereby reducing the number of out-of-vocabulary (OOV) words. The recognizer, with each of the different language models, was applied to audio clips from three experimental settings: formal speech, broadcast news speech, and TED talks in Galician. The results clearly show an improvement across the proposed experimental settings.
    This work was carried out within the National Plan project TraceThem TEC2015-65345-P and the Galician network TecAnDaLi ED431D 2016/011, funded by the Xunta de Galicia. It also benefits from Xunta de Galicia funding for Competitive Reference Groups (GRC2014/024) and for the Agrupación Estratéxica Consolidada de Galicia (accreditation 2016-2019), and from the European Union through ERDF (FEDER) funds.
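The abstract links recognition performance to the language model's training text and to the OOV rate. A toy Python sketch of both quantities follows, assuming whitespace-tokenized text: `oov_rate` counts uncovered running tokens, and `AddKBigramLM` is a hypothetical add-k smoothed bigram model whose perplexity gives a rough measure of how well the model fits a given domain. The paper's actual toolkit, n-gram order and smoothing are not stated; production systems typically use stronger smoothing such as Kneser-Ney.

```python
import math
from collections import Counter
from typing import Iterable, List, Set

UNK = "<unk>"


def oov_rate(tokens: List[str], vocabulary: Set[str]) -> float:
    """Fraction of running test tokens that the recognizer vocabulary does not cover."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


class AddKBigramLM:
    """Bigram model with add-k smoothing; unseen test words are mapped to <unk>."""

    def __init__(self, sentences: Iterable[List[str]], k: float = 1.0) -> None:
        self.k = k
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        self.vocab: Set[str] = {"</s>", UNK}  # every token the model can predict
        for words in sentences:
            self.vocab.update(words)
            padded = ["<s>"] + words + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1

    def prob(self, prev: str, cur: str) -> float:
        v = len(self.vocab)
        return (self.bigrams[(prev, cur)] + self.k) / (self.history[prev] + self.k * v)

    def perplexity(self, words: List[str]) -> float:
        mapped = [w if w in self.vocab else UNK for w in words]
        padded = ["<s>"] + mapped + ["</s>"]
        log_p = sum(math.log(self.prob(p, c)) for p, c in zip(padded, padded[1:]))
        return math.exp(-log_p / (len(padded) - 1))


# Hypothetical Galician training text
train = [["o", "can", "come"], ["o", "gato", "come"]]
lm = AddKBigramLM(train)
print(oov_rate(["o", "can", "ladra"], lm.vocab))  # 0.3333333333333333
print(lm.perplexity(["o", "gato", "come"]) < lm.perplexity(["come", "gato", "o"]))  # True
```

Adding in-domain text both grows the vocabulary (lowering the OOV rate) and sharpens the bigram counts (lowering perplexity on matching test text), which is the effect the abstract describes.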

    Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

    The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area today. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research in this area is continuously fostered by the organization of QbE STD evaluations. This paper presents a multi-domain, internationally open evaluation for QbE STD in Spanish. The evaluation aims to retrieve the speech files that contain the queries, providing their start and end times and a score that reflects the confidence in the detection. Three Spanish speech databases covering different domains were used in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast television (TV) shows; and the COREMAH database, which contains spontaneous two-person conversations on different topics. The evaluation was carefully designed so that several analyses of the main results could be carried out. We present the evaluation itself, the three databases, the evaluation metrics, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on query properties (in-vocabulary/out-of-vocabulary queries, single-word/multi-word queries, and native/foreign queries). Fusion results of the primary systems submitted to the evaluation are also presented. Three different teams took part in the evaluation, and ten different systems were submitted. The results suggest that the QbE STD task remains an open problem and that the performance of these systems is highly sensitive to changes in the data domain. Nevertheless, QbE STD strategies are able to outperform text-based STD in unseen data domains.
    Funding: Centro singular de investigación de Galicia; ED431G/04 | Universidad del País Vasco; GIU16/68 | Ministerio de Economía y Competitividad; TEC2015-68172-C2-1-P | Ministerio de Ciencia, Innovación y Competitividad; RTI2018-098091-B-I00 | Xunta de Galicia; ED431G/0
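The listing reports fusion results for the primary systems without describing the fusion method. One common late-fusion scheme, sketched below in Python purely as an assumption, z-normalizes each system's scores per query, groups detections that overlap in time, and averages the normalized scores; the `Detection` type, the `fuse` function and the `missing_score` default are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    query: str
    file: str
    start: float  # seconds
    end: float    # seconds
    score: float


def znorm(dets: List[Detection]) -> List[Detection]:
    """Z-normalise one system's scores per query so different systems are comparable."""
    by_query: Dict[str, List[Detection]] = {}
    for d in dets:
        by_query.setdefault(d.query, []).append(d)
    normed: List[Detection] = []
    for group in by_query.values():
        scores = [d.score for d in group]
        mu, sigma = mean(scores), pstdev(scores) or 1.0  # guard against zero spread
        normed += [Detection(d.query, d.file, d.start, d.end, (d.score - mu) / sigma) for d in group]
    return normed


def overlaps(a: Detection, b: Detection) -> bool:
    return a.query == b.query and a.file == b.file and a.start < b.end and b.start < a.end


def fuse(systems: List[List[Detection]], missing_score: float = -3.0) -> List[Detection]:
    """Late fusion: average normalised scores of time-overlapping detections.

    A system with no detection in a cluster contributes `missing_score`
    (a hypothetical penalty that would be tuned on development data).
    """
    clusters: List[List[Tuple[int, Detection]]] = []
    for sys_id, dets in enumerate(znorm(s) for s in systems):
        for d in dets:
            for cluster in clusters:
                if any(overlaps(d, other) for _, other in cluster):
                    cluster.append((sys_id, d))
                    break
            else:
                clusters.append([(sys_id, d)])
    fused: List[Detection] = []
    for cluster in clusters:
        best: Dict[int, float] = {}
        for sys_id, d in cluster:
            best[sys_id] = max(best.get(sys_id, -float("inf")), d.score)
        scores = [best.get(i, missing_score) for i in range(len(systems))]
        top = max(cluster, key=lambda item: item[1].score)[1]  # keep its timing
        fused.append(Detection(top.query, top.file, top.start, top.end, mean(scores)))
    return fused


# Hypothetical outputs of two systems for one query in one file
sys_a = [Detection("q1", "f1", 1.0, 1.5, 2.0), Detection("q1", "f1", 5.0, 5.4, 0.0)]
sys_b = [Detection("q1", "f1", 1.1, 1.6, 10.0), Detection("q1", "f1", 8.0, 8.5, 4.0)]
for det in fuse([sys_a, sys_b]):
    print(det)
```

The detection found by both systems keeps a high fused score, while detections found by only one system are pulled down by the missing-system penalty.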

    ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

    Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task that aims to retrieve data from a speech repository given a textual representation of a search term (which can include one or more words). This paper presents a multi-domain, internationally open evaluation for STD in Spanish. The evaluation was carefully designed so that several analyses of the main results could be carried out. The evaluation task consists of retrieving the speech files that contain the terms, providing their start and end times and a score that reflects the confidence in the detection. Three Spanish speech databases covering different domains were used in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast news programs; and the COREMAH database, which contains spontaneous two-person conversations on different topics. We present the evaluation itself, the three databases, the evaluation metric, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on term properties (in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native/foreign terms). Fusion results of the primary systems submitted to the evaluation are also presented. Three different research groups took part in the evaluation, and 11 different systems were submitted. The results suggest that the STD task remains an open problem and that performance is highly sensitive to changes in the data domain.
    Funding: Ministerio de Economía y Competitividad; TIN2015-64282-R | Ministerio de Economía y Competitividad; RTI2018-093336-B-C22 | Ministerio de Economía y Competitividad; TEC2015-65345-P | Xunta de Galicia; ED431B 2016/035 | Xunta de Galicia; GPC ED431B 2019/003 | Xunta de Galicia; GRC 2014/024 | Xunta de Galicia; ED431G/01 | Xunta de Galicia; ED431G/04 | Agrupación estratéxica consolidada; GIU16/68 | Ministerio de Economía y Competitividad; TEC2015-68172-C2-1-
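The listing mentions the evaluation metric without naming it. Assuming the term-weighted value (TWV) defined for the NIST 2006 STD evaluation, which is widely used in STD benchmarks, the per-threshold computation looks roughly like this Python sketch; the function name and the dictionary inputs are hypothetical.

```python
from typing import Dict


def term_weighted_value(
    n_true: Dict[str, int],
    n_hit: Dict[str, int],
    n_fa: Dict[str, int],
    speech_seconds: float,
    beta: float = 999.9,
) -> float:
    """Term-weighted value at one decision threshold (NIST STD-style definition).

    TWV = 1 - mean over terms of [P_miss(term) + beta * P_FA(term)], with
    P_miss = 1 - N_hit / N_true and P_FA = N_FA / (T - N_true), where T is the
    speech duration in seconds. Terms with no reference occurrences are skipped.
    """
    terms = [t for t, n in n_true.items() if n > 0]
    if not terms:
        raise ValueError("no search term occurs in the reference")
    cost = 0.0
    for t in terms:
        p_miss = 1.0 - n_hit.get(t, 0) / n_true[t]
        p_fa = n_fa.get(t, 0) / (speech_seconds - n_true[t])
        cost += p_miss + beta * p_fa
    return 1.0 - cost / len(terms)


# Hypothetical counts for two terms over 1 h of speech, no false alarms
print(term_weighted_value({"a": 10, "b": 4}, {"a": 10, "b": 2}, {}, 3600.0))  # 0.75
```

With the default β = 999.9, a single false alarm in roughly 1000 s of speech costs about as much as missing every occurrence of that term, so decision thresholds have to be chosen carefully. Evaluations using this metric usually report ATWV (TWV at the system's own threshold) and MTWV (the maximum TWV over all thresholds).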

    Albayzín-2014 evaluation: audio segmentation and classification in broadcast news domains

    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0076-3
    Audio segmentation is an important pre-processing step for improving the performance of many speech technology tasks and is therefore of clear research interest. This paper describes the database, the metric, the systems and the results of the Albayzín-2014 audio segmentation campaign. Unlike previous evaluations, in which the task was to segment non-overlapping classes, the Albayzín-2014 evaluation requires delimiting the presence of speech, music and/or noise, which can occur simultaneously. The evaluation database was created by mixing different media and noises to increase the difficulty of the task. Seven segmentation systems from four research groups were evaluated and combined. Their experimental results were analyzed and compared with the aim of providing a benchmark and highlighting promising directions in this field.
    This work has been partially funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02 and supported by the European Regional Development Fund and the Spanish Government (‘SpeechTech4All Project’ TEC2012-38939-C03
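Because the Albayzín-2014 task allows speech, music and noise to co-occur, one natural representation is an independent presence decision per class and per frame. The Python sketch below assumes such boolean frame decisions and a 10 ms default frame shift (both hypothetical, not taken from the campaign's systems) and turns them into per-class time segments that may overlap.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def frames_to_segments(labels: Dict[str, List[bool]], frame_shift: float = 0.01) -> Dict[str, List[Segment]]:
    """Convert per-class, per-frame presence decisions into time segments.

    Classes are handled independently, so speech, music and noise segments
    are allowed to overlap in time.
    """
    segments: Dict[str, List[Segment]] = {}
    for cls, active in labels.items():
        runs: List[Segment] = []
        start: Optional[int] = None
        for i, on in enumerate(active + [False]):  # sentinel closes a final run
            if on and start is None:
                start = i
            elif not on and start is not None:
                runs.append((round(start * frame_shift, 6), round(i * frame_shift, 6)))
                start = None
        segments[cls] = runs
    return segments


# Hypothetical decisions with a 0.5 s frame shift; music overlaps the first speech run
decisions = {
    "speech": [True, True, True, False, False, True],
    "music": [False, True, True, True, True, False],
    "noise": [False] * 6,
}
print(frames_to_segments(decisions, frame_shift=0.5))
```

Treating each class as a separate binary detection problem is what distinguishes this multi-label setting from the earlier single-label segmentation task, where every frame carried exactly one class.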

    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
    Spoken term detection (STD) aims to retrieve data from a speech repository given a textual representation of the search term. It is currently attracting considerable interest because of the large volume of multimedia information available. STD differs from automatic speech recognition (ASR) in that ASR is concerned with all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation dealing with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the corresponding speech file, along with a score that reflects the confidence in the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database that comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. The results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the submitted systems and provides an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
    This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160).
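To make the STD/ASR distinction concrete, the Python sketch below searches a list of terms in a time-aligned 1-best word transcript and returns the start time, end time and score that the evaluation requires. It is a deliberately simple baseline under stated assumptions: the `Word` and `Hit` types, the confidence-product score and the exact-match rule are hypothetical, and real STD systems usually search richer ASR output such as lattices.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Word:
    token: str
    start: float  # seconds
    end: float    # seconds
    conf: float   # recognizer confidence in [0, 1]


@dataclass
class Hit:
    term: str
    start: float
    end: float
    score: float


def search_terms(words: List[Word], terms: List[str], vocabulary: Set[str]) -> List[Hit]:
    """Exact-match search for (possibly multi-word) terms in a time-aligned 1-best transcript.

    A multi-word term must match consecutive words; its score is the product of
    their confidences. Terms containing an out-of-vocabulary word are skipped,
    because a word-based recognizer cannot output them.
    """
    tokens = [w.token.lower() for w in words]
    hits: List[Hit] = []
    for term in terms:
        parts = term.lower().split()
        if not parts or any(p not in vocabulary for p in parts):
            continue  # empty or OOV term: OOV would need sub-word (e.g. phone-based) search
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                score = 1.0
                for w in words[i:i + n]:
                    score *= w.conf
                hits.append(Hit(term, words[i].start, words[i + n - 1].end, score))
    return hits


# Hypothetical recognizer output and vocabulary
transcript = [
    Word("el", 0.0, 0.2, 0.9),
    Word("reconocimiento", 0.2, 0.9, 0.8),
    Word("de", 0.9, 1.0, 0.95),
    Word("voz", 1.0, 1.3, 0.5),
]
vocab = {"el", "reconocimiento", "de", "voz"}
for hit in search_terms(transcript, ["reconocimiento de voz", "voz", "Albayzín"], vocab):
    print(hit)
```

The OOV term "Albayzín" is never found by this word-based search, which illustrates why performance depends on the out-of-vocabulary term rate and why the evaluation analyzes in-vocabulary and out-of-vocabulary terms separately.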